-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23973][SQL] Remove consecutive Sorts #21072
Conversation
Test build #89367 has finished for PR 21072 at commit
|
*/ | ||
object RemoveRedundantSorts extends Rule[LogicalPlan] { | ||
def apply(plan: LogicalPlan): LogicalPlan = plan transform { | ||
case Sort(orders, true, child) if SortOrder.orderingSatisfies(child.outputOrdering, orders) => | ||
child | ||
case s @ Sort(_, _, Sort(_, _, child)) => s.copy(child = child) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this! It might be useful to generalise this to any pair of sorts separated by 0 or more projections or filters. I did this for my SPARK-23975 PR, see: henryr@bb992c2#diff-a636a87d8843eeccca90140be91d4fafR322
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it makes sense. I will do, thanks.
Test build #89388 has finished for PR 21072 at commit
|
Test build #89395 has finished for PR 21072 at commit
|
retest this please |
val optimized = Optimize.execute(orderedTwice.analyze) | ||
val correctAnswer = testRelation.orderBy('b.desc).analyze | ||
comparePlans(optimized, correctAnswer) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for three consecutive sorts? Two is the base case, three will help us show the inductive case :)
@@ -98,4 +98,31 @@ class RemoveRedundantSortsSuite extends PlanTest { | |||
val correctAnswer = groupedAndResorted.analyze | |||
comparePlans(optimized, correctAnswer) | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test which explicitly confirms that sort.limit.sort is not simplified? I know the above two tests cover that case, but it's good to have one dedicated to testing this important property.
* Removes Sort operation if the child is already sorted | ||
* Removes redundant Sort operation. This can happen: | ||
* 1) if the child is already sorted | ||
* 2) if the there is another Sort operator separated by 0...n Project/Filter operators |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 'the there'
Test build #89413 has finished for PR 21072 at commit
|
@henryr thanks, I added the test cases you suggested :) |
Test build #89445 has finished for PR 21072 at commit
|
retest this please |
Test build #89449 has finished for PR 21072 at commit
|
anymore comments @henryr ? comments @cloud-fan ? |
|
||
def recursiveRemoveSort(plan: LogicalPlan): LogicalPlan = plan match { | ||
case Project(fields, child) => Project(fields, recursiveRemoveSort(child)) | ||
case Filter(condition, child) => Filter(condition, recursiveRemoveSort(child)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should at least add ResolvedHint
. To easily expand the white list in the future, I'd like to change the code style to
def recursiveRemoveSort(plan: LogicalPlan): LogicalPlan = plan match {
case s: Sort => recursiveRemoveSort(s.child)
case other if canEliminateSort(other) => other.withNewChildren(other.children.map(recursiveRemoveSort))
case _ => plan
}
def canEliminateSort(plan: LogicalPlan): Boolean = plan match {
case p: Project => p.projectList.forall(_.deterministic)
case f: Filter => f.condition.deterministic
case _: ResolvedHint => true
...
case _ => false
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you think we should check for the filter condition and the projected items to be deterministic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by the definition of deterministic
, the entire input is the stats of the expression. It's very likely we will get a different result if we remove sort before filter, e.g. rowId() < 10
will get the first 10 rows, if you sort the input, the first 10 rows changed.
I think we should be conservative about deterministic expressions.
* Removes Sort operation if the child is already sorted | ||
* Removes redundant Sort operation. This can happen: | ||
* 1) if the child is already sorted | ||
* 2) if there is another Sort operator separated by 0...n Project/Filter operators | ||
*/ | ||
object RemoveRedundantSorts extends Rule[LogicalPlan] { | ||
def apply(plan: LogicalPlan): LogicalPlan = plan transform { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: now it's more efficient to do transformDown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assume the plan is
Sort
Filter
Sort
Filter
Sort
OtherPlan
If we do transformUp
, we hit the rule 3 times, which has some unnecessary transformation(OtherPlan
is transformed 3 times). If it's transformDown
, it's one-shot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but I saw that transfrom
actually does transformDown
. Anyway, I see that this might change and here we best have transformDown
LGTM |
Test build #89721 has finished for PR 21072 at commit
|
Test build #89726 has finished for PR 21072 at commit
|
thanks, merging to master! |
What changes were proposed in this pull request?
In SPARK-23375 we introduced the ability of removing
Sort
operation during query optimization if the data is already sorted. In this follow-up we remove also aSort
which is followed by anotherSort
: in this case the first sort is not needed and can be safely removed.The PR starts from @henryr's comment: #20560 (comment). So credit should be given to him.
How was this patch tested?
added UT